Phylogenomics by Christoph Bleidorn

Author:Christoph Bleidorn
Language: eng
Format: epub
Publisher: Springer International Publishing, Cham


Of course, the scoring system used is arbitrary, and a different one may support the choice of an alternative alignment. The scoring of gap characters in particular has been debated (Giribet and Wheeler 1999). Gaps obviously have to be introduced when aligning two sequences of different lengths. Gaps result from a different biological process than mismatches. Whereas mismatches (mostly) trace back to mutations, gaps are the result of indels. Possible mechanisms for indels are errors during DNA replication (e.g. slipped-strand mispairing), unequal crossing over during recombination or the insertion of mobile elements (McGuffin 2009; Levinson and Gutman 1987). All these mechanisms usually insert (or delete) several positions at once, which implies that a run of neighbouring gaps stems from a single event. A scoring system that treats all gap positions independently would therefore over-penalize them, as it implicitly assumes a separate event for each position (McGuffin 2009). As a solution to this problem, affine gap costs have been introduced. This type of penalty differentiates between opening a gap and extending it. For example, using a gap opening cost of −1 and a gap extension cost of −0.1 for Fig. 6.2 would result in a total score of +5.4, whereas the alignment in Fig. 6.1 remains at +3. Similarly, it is possible to introduce different scores for mismatches. For example, when aligning protein sequences, scores are usually based on matrices that incorporate the evolutionary preference for certain substitutions over others. Widely used matrices are BLOSUM and PAM (Henikoff and Henikoff 1992). Figure 6.3 shows the BLOSUM62 matrix (Henikoff and Henikoff 1992), which is the default matrix for BLAST searches (see below) at the amino acid level. Scores in these matrices are given as log-odds, which can be used directly as parameters of alignment scoring schemes.
Positive scores mean that an amino acid pairing is found more often than expected by chance (conservative substitutions); negative scores indicate pairings that occur less often than expected (non-conservative substitutions) (Eddy 2004). Alternatively, a matrix that counts the nucleotide changes needed for an amino acid substitution, as inferred from the genetic code, can be applied. In this case, costs are either −1 (one change in the codon triplet needed), −2 (two changes needed) or −3 (three changes needed). Obviously, the choice of the scoring function and its parameters has a huge influence on which pairwise alignment is selected as the best.
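Two of the scoring choices discussed above can be made concrete with a minimal Python sketch. It rests on several assumptions that are not taken from the text: the function names `score_alignment` and `codon_step_cost` are hypothetical, matches score +1 and mismatches −1, the first position of a gap is charged the opening cost and every further position the extension cost, and the toy sequences are invented (they are not those of Figs. 6.1 and 6.2). The codon table is the standard genetic code in the usual TCAG ordering.

```python
from itertools import product


def score_alignment(seq_a, seq_b, match=1.0, mismatch=-1.0,
                    gap_open=-1.0, gap_extend=-1.0):
    """Score two already-aligned strings of equal length ('-' marks a gap).

    With gap_extend == gap_open every gap position is penalized
    independently (linear costs); a milder gap_extend gives affine costs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    prev_gap_row = None  # row that carried a gap in the previous column
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            raise ValueError("column with gaps in both sequences")
        if a == "-" or b == "-":
            row = "a" if a == "-" else "b"
            # Assumption: the first gap position pays gap_open,
            # each following position of the same gap pays gap_extend.
            total += gap_extend if prev_gap_row == row else gap_open
            prev_gap_row = row
        else:
            total += match if a == b else mismatch
            prev_gap_row = None
    return total


# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIM"
               "TTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODONS = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append("".join(bases))


def codon_step_cost(aa1, aa2):
    """Return minus the minimum number of nucleotide changes needed to
    turn any codon of aa1 into any codon of aa2 (0 for identical residues)."""
    steps = min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )
    return -steps


if __name__ == "__main__":
    top = "ACGTACGTAC"
    bottom = "ACG----TAC"  # one gap of four positions, i.e. a single indel
    print(f"linear gap costs: {score_alignment(top, bottom):.1f}")
    print(f"affine gap costs: "
          f"{score_alignment(top, bottom, gap_extend=-0.1):.1f}")
    for pair in ("FL", "MW", "MY"):
        print(pair, codon_step_cost(*pair))
```

For the toy alignment, six matches and one gap of four positions give a score of 2 under linear costs and 4.7 under affine costs, which illustrates how affine costs avoid over-penalizing a single indel. The codon-based costs come out as −1 for F/L, −2 for M/W and −3 for M/Y.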

Fig. 6.3 BLOSUM62 matrix giving log-odds scores for each possible amino acid substitution, derived from pairwise sequence alignments of at least 62% identity
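As a brief worked formula, the log-odds scores in such a matrix can be written in the standard form discussed by Eddy (2004). The symbols are introduced here for illustration and do not appear in the text: p_ab is the frequency with which residues a and b are observed aligned to each other in the trusted alignments, q_a and q_b are their background frequencies, and λ is a scaling factor.

```latex
s(a,b) \;=\; \frac{1}{\lambda}\,\log\frac{p_{ab}}{q_a\, q_b}
```

The ratio exceeds 1 when a pair occurs more often than expected by chance, so the score is positive; it falls below 1 for pairs seen less often than expected, so the score is negative. BLOSUM62 scores are commonly reported in half-bit units, i.e. 2 log2 of this ratio rounded to the nearest integer.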


